Reference: Scalability: strong and weak scaling
In HPC, the network model is MPI-libfabric-NIC: the MPI library issues its communication calls through libfabric, which in turn drives the NIC.
OpenFabrics Interfaces (OFI) is a framework focused on exporting fabric communication services to applications. Libfabric is a core component of OFI.
The OpenFabrics Enterprise Distribution (OFED) is the open-source bundle of RDMA libraries and kernel modules published by the OpenFabrics Alliance. Mellanox OFED (MOFED) is Mellanox's implementation of the OFED libraries and kernel modules.
InfiniBand refers to two distinct things. The first is a physical link-layer protocol for InfiniBand networks. The second is a higher-level programming API called the InfiniBand Verbs API, which is an implementation of remote direct memory access (RDMA) technology.
Reference: CHAPTER 13. CONFIGURE INFINIBAND AND RDMA NETWORKS
Libfabric supports a variety of high-performance fabrics and networking hardware. It runs over standard TCP and UDP networks as well as over high-performance fabrics such as Omni-Path Architecture, InfiniBand, Cray GNI, Blue Gene Architecture, iWARP RDMA Ethernet, and RDMA over Converged Ethernet (RoCE).
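To make this concrete, here is a minimal sketch, assuming the libfabric development headers are installed and the program is linked with -lfabric (the requested API version 1.9 is an arbitrary choice), that asks libfabric which providers and fabrics it can use on the current machine:

```c
/* list_providers.c: enumerate the libfabric providers/fabrics available here.
 * Build (assumption): gcc list_providers.c -o list_providers -lfabric */
#include <stdio.h>
#include <rdma/fabric.h>

int main(void)
{
    struct fi_info *info = NULL, *cur;

    /* NULL hints: do not restrict the search, return everything usable. */
    int ret = fi_getinfo(FI_VERSION(1, 9), NULL, NULL, 0, NULL, &info);
    if (ret) {
        fprintf(stderr, "fi_getinfo: %s\n", fi_strerror(-ret));
        return 1;
    }

    for (cur = info; cur; cur = cur->next)
        printf("provider: %-12s fabric: %s\n",
               cur->fabric_attr->prov_name, cur->fabric_attr->name);

    fi_freeinfo(info);
    return 0;
}
```

On a plain Ethernet node this typically lists only the TCP/UDP-based providers; on an InfiniBand node the verbs provider should also show up.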
IB is high-performance because, unlike TCP/IP, the kernel is not involved (hence, user-level) in operations that transmit or receive data; the kernel is involved only in creating the resources used to issue those transfers. Additionally, unlike TCP/IP, the InfiniBand interface permits RDMA operations (remote reads, writes, atomics, etc.).
libibverbs is the software component (Verbs API) of the IB interface. As sockets is to TCP/IP, libibverbs is to IB.
The hardware component of IB is where different vendors come into play. The IB interface is abstract; hence, multiple vendors can have different implementations of the IB specification.
Mellanox Technologies has been an active, prominent InfiniBand hardware vendor. In addition to meeting the IB hardware specifications in the NIC design, the vendors have to support the libibverbs API by providing a user-space driver and a kernel-space driver that actually do the work (of setting up resources on the NIC) when a libibverbs function such as ibv_open_device is called. These vendor-specific libraries and kernel modules are a standard part of the OFED. The vendor-specific user-space libraries are called providers in rdma-core. Mellanox OFED (MOFED) is Mellanox's implementation of the OFED libraries and kernel modules. MOFED contains certain optimizations that are targeted towards Mellanox hardware (the mlx4 and mlx5 providers) but haven't been incorporated into OFED yet.
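To make the libibverbs side concrete, the sketch below (assuming the rdma-core/libibverbs headers are installed and the program is linked with -libverbs) lists the RDMA devices exposed by the installed providers and opens the first one, which is exactly the point where the vendor's user-space and kernel drivers set up resources on the NIC:

```c
/* open_ib_device.c: list RDMA devices and open the first one.
 * Build (assumption): gcc open_ib_device.c -o open_ib_device -libverbs */
#include <stdio.h>
#include <infiniband/verbs.h>

int main(void)
{
    int num = 0;
    struct ibv_device **devs = ibv_get_device_list(&num);
    if (!devs || num == 0) {
        fprintf(stderr, "no RDMA devices found\n");
        return 1;
    }

    for (int i = 0; i < num; i++)
        printf("device %d: %s\n", i, ibv_get_device_name(devs[i]));

    /* ibv_open_device() is where the vendor-specific provider and kernel
     * module do the actual work of setting up resources on the NIC. */
    struct ibv_context *ctx = ibv_open_device(devs[0]);
    if (!ctx) {
        fprintf(stderr, "ibv_open_device failed\n");
        ibv_free_device_list(devs);
        return 1;
    }
    printf("opened %s\n", ibv_get_device_name(ctx->device));

    ibv_close_device(ctx);
    ibv_free_device_list(devs);
    return 0;
}
```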
Alongside InfiniBand, several other user-level networking interfaces exist. Typically they are proprietary and vendor-specific: Cray has the uGNI interface, Intel Omni-Path has PSM2, Cisco has usNIC, etc. The underlying concepts (message queues, completion queues, registered memory, etc.) are similar across the different interfaces, with certain differences in capabilities and semantics. OpenFabrics Interfaces (OFI) intends to unify all of the available interfaces by providing an abstract API: libfabric. Each vendor then supports OFI through its libfabric provider, which calls the corresponding functions in the vendor's own interface. libfabric is a fairly recent API and intends to serve a level of abstraction higher than that of libibverbs.
Reference: For the RDMA novice: libfabric, libibverbs, InfiniBand, OFED, MOFED?
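As a sketch of how this abstraction looks from the application side, the snippet below (again assuming libfabric and -lfabric; the choice of the verbs provider and of an FI_EP_RDM endpoint is purely illustrative) asks libfabric for the libibverbs-backed provider in particular. Changing the provider name in the hints is all it takes to target a different fabric with the same application code:

```c
/* select_provider.c: request a specific libfabric provider via hints.
 * Build (assumption): gcc select_provider.c -o select_provider -lfabric */
#include <stdio.h>
#include <string.h>
#include <rdma/fabric.h>

int main(void)
{
    struct fi_info *hints = fi_allocinfo();
    struct fi_info *info = NULL;

    /* Ask only for the "verbs" provider (the one backed by libibverbs). */
    hints->fabric_attr->prov_name = strdup("verbs");
    hints->ep_attr->type = FI_EP_RDM;   /* reliable, connectionless endpoint */
    hints->caps = FI_MSG | FI_RMA;      /* two-sided messaging + RDMA read/write */

    int ret = fi_getinfo(FI_VERSION(1, 9), NULL, NULL, 0, hints, &info);
    if (ret)
        fprintf(stderr, "no matching provider: %s\n", fi_strerror(-ret));
    else
        printf("matched provider: %s\n", info->fabric_attr->prov_name);

    if (info)
        fi_freeinfo(info);
    fi_freeinfo(hints);
    return 0;
}
```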
Fabric control in Intel MPI reports that Intel MPI 2019 has issues with AMD processors; Intel has confirmed this.
/etc/security/limits.conf and /etc/security/limits.d/20-nproc.conf set the limit on the number of processes a user can run.
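As a quick check of what those files end up configuring, here is a minimal, Linux-specific C sketch that prints the process limits (RLIMIT_NPROC) in effect for the calling user:

```c
/* check_nproc.c: print the per-user process limit set via limits.conf.
 * Build: gcc check_nproc.c -o check_nproc */
#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
    struct rlimit rl;

    /* RLIMIT_NPROC is the limit that the nproc entries in limits.conf /
     * 20-nproc.conf control. */
    if (getrlimit(RLIMIT_NPROC, &rl) != 0) {
        perror("getrlimit");
        return 1;
    }
    printf("nproc soft limit: %llu\n", (unsigned long long)rl.rlim_cur);
    printf("nproc hard limit: %llu\n", (unsigned long long)rl.rlim_max);
    return 0;
}
```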
Set up Message Passing Interface for HPC gives some commands for running MPI.
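For reference, a minimal MPI program and a typical way to build and launch it look like this (the process count of 4 is arbitrary; the exact launcher flags depend on the MPI implementation):

```c
/* mpi_hello.c: minimal MPI program.
 * Build/run (typical): mpicc mpi_hello.c -o mpi_hello && mpirun -np 4 ./mpi_hello */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size, len;
    char name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's rank */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of processes */
    MPI_Get_processor_name(name, &len);     /* host this rank runs on */

    printf("rank %d of %d on %s\n", rank, size, name);

    MPI_Finalize();
    return 0;
}
```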
Unified Communication X (UCX) is a framework of communication APIs for HPC. It is optimized for MPI communication over InfiniBand and works with many MPI implementations such as OpenMPI and MPICH.
For now, I still cannot understand what UCX is; check the UCX FAQ.
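To make UCX a bit more concrete: like libfabric, it sits between MPI and the fabric. The minimal sketch below (assuming the UCP headers from a UCX installation and linking with -lucp -lucs) only initializes a UCP context with tag matching enabled, which is roughly the feature an MPI library requests from UCX:

```c
/* ucx_init.c: initialize a UCP context, the entry point of UCX's high-level API.
 * Build (assumption): gcc ucx_init.c -o ucx_init -lucp -lucs */
#include <stdio.h>
#include <ucp/api/ucp.h>

int main(void)
{
    ucp_config_t *config;
    ucp_context_h context;
    ucp_params_t params = {0};
    ucs_status_t status;

    /* Read UCX configuration from the environment (UCX_TLS, UCX_NET_DEVICES, ...). */
    status = ucp_config_read(NULL, NULL, &config);
    if (status != UCS_OK) {
        fprintf(stderr, "ucp_config_read failed: %s\n", ucs_status_string(status));
        return 1;
    }

    /* Request tag-matching support, the feature MPI send/receive maps onto. */
    params.field_mask = UCP_PARAM_FIELD_FEATURES;
    params.features   = UCP_FEATURE_TAG;

    status = ucp_init(&params, config, &context);
    ucp_config_release(config);
    if (status != UCS_OK) {
        fprintf(stderr, "ucp_init failed: %s\n", ucs_status_string(status));
        return 1;
    }

    printf("UCP context initialized\n");
    ucp_cleanup(context);
    return 0;
}
```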
Some relevant issues on the Intel forum:
MPI_IRECV sporadically hangs for Intel 2019/2020 but not Intel 2018
Intel MPI update 7 on Mellanox IB causes mpi processes to hang
MPI myrank debugging
Other Intel articles:
Improve Performance and Stability with Intel® MPI Library on InfiniBand
Check OpenMPI FAQ